In [1]:
%autosave 10


Autosaving every 10 seconds

Why

  • As a preprocessing step for another technique
  • For exploring data

Why not use scikit-learn

  • When things don't work, it's useful to dig in, and that requires knowing how the algorithm works
  • Do use the library, but sometimes do things by hand

K-Means

  • The objective is not convex: the algorithm is guaranteed to converge, but possibly to a local minimum; it is not guaranteed to find the global minimum.
    • Run it multiple times with randomised starting centroids and pick the best result (see the sketch below).
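
A minimal sketch of the restart strategy, using scikit-learn's KMeans on made-up toy data (n_init controls the number of random restarts; the run with the lowest inertia is kept):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.RandomState(0)
    X = rng.randn(300, 2)  # made-up toy data; replace with your own

    # n_init=10 runs k-means 10 times from random starting centroids and
    # keeps the run with the lowest inertia (sum of squared errors).
    km = KMeans(n_clusters=3, init="random", n_init=10, random_state=0)
    labels = km.fit_predict(X)
    print(km.inertia_)  # objective value of the best of the 10 runs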

When does K-Means break down, and when not

  • If features (dimensions) have different scales
    • K-Means assumes the data are well described by centroids and that all dimensions share the same scale.
    • !!AI so what kind of feature scaling to use?
  • Need to scale features first (see the scaling sketch after this list)
  • K-Means is resistant to irrelevant dimensions (ones carrying no information)
    • but if the irrelevant dimension(s) have large variance, it fails (!!AI variance, or different scale? both?)
  • If a dimension isn't Gaussian but uniform random (noise), it also breaks down.
  • Lessons:
    • Be careful when adding new features / dimensions.
    • "Curse of dimensionality": with too many dimensions, the very notion of similarity breaks down, and Euclidean distance becomes less useful (see the distance-concentration sketch after this list).
  • If the clusters are non-isotropic (not linearly separable), i.e. there is no clear centroid, it will fail too.
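
As flagged in the list above, a minimal sketch of scaling features before k-means, assuming StandardScaler (zero mean, unit variance per feature) is an acceptable choice; the two-feature toy data is made up:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.RandomState(0)
    # Two features on wildly different scales: the second would dominate
    # the Euclidean distance if left unscaled.
    X = np.column_stack([rng.randn(200), 1000.0 * rng.randn(200)])

    # StandardScaler rescales each feature to zero mean and unit variance,
    # so no single dimension dominates the distance calculation.
    model = make_pipeline(StandardScaler(),
                          KMeans(n_clusters=2, n_init=10, random_state=0))
    labels = model.fit_predict(X)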
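
And a minimal sketch of the distance-concentration effect behind the "curse of dimensionality", on made-up uniform random data: as the dimension grows, the relative gap between the nearest and farthest point shrinks, so Euclidean similarity becomes less informative.

    import numpy as np

    rng = np.random.RandomState(0)
    # Relative spread of distances from one point to all others, by dimension.
    for d in (2, 10, 100, 1000):
        X = rng.rand(500, d)
        dists = np.linalg.norm(X[0] - X[1:], axis=1)
        print(d, (dists.max() - dists.min()) / dists.min())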

Choosing between results

  • Given two sets of clustering results, what is the error?
  • Inertia: the sum of squared distances between each point and its nearest cluster centre.
    • On its own this isn't useful: adding more clusters always decreases it, and it doesn't penalise the increased complexity of the solution (cf. regularisation).
    • Use it to compare two clustering results given an a priori assumption about the number of clusters.
  • Silhouette coefficient: measures the compactness of a point's own cluster relative to its distance from points outside that cluster.
    • Use it to choose the optimum number of clusters (see the sketch below).
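
A minimal sketch of choosing k with the silhouette coefficient, using scikit-learn's silhouette_score on made-up blob data: inertia always decreases as k grows, while the silhouette should peak near the true number of clusters.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

    # Inertia falls monotonically with k; the silhouette coefficient does
    # not, so it can be used to pick k.
    for k in range(2, 8):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(k, km.inertia_, silhouette_score(X, km.labels_))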

Visualising errors

!!AI see presenter notes.

Spectral clustering

  • Can capture clusters that are not linearly separable (non-isotropic); see the sketch after this list.
  • Can estimate the number of clusters (e.g. via the eigengap heuristic).
  • Not restricted to Euclidean distance as the distance metric
    • Pass in a "similarity matrix" instead
  • Good for text.
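
A minimal sketch of spectral clustering on data with no clear centroids, using scikit-learn's SpectralClustering on the two-moons toy dataset (k-means fails here):

    from sklearn.cluster import SpectralClustering
    from sklearn.datasets import make_moons

    # Two interleaved half-circles: not linearly separable, no centroids.
    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    # affinity="nearest_neighbors" builds a sparse similarity graph from
    # the data; a precomputed similarity matrix can be passed instead with
    # affinity="precomputed".
    sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0)
    labels = sc.fit_predict(X)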

!!AI see notes. Requires solid linear algebra knowledge (maybe add references).

  • Spectral clustering looks at the eigenvectors with the smallest eigenvalues, whereas PCA looks at those with the largest; it solves a different problem.

  • Unnormalised spectral clustering tries to balance the size (number of points) of the clusters
    • Relaxation of the RatioCut graph-partitioning objective

  • Normalised spectral clustering tries to balance the degree (density) of the clusters
    • Relaxation of the normalised cut (NCut) objective
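
A minimal sketch of the eigenvector step, assuming the unnormalised graph Laplacian L = D - W built from a k-nearest-neighbours graph: the eigenvectors for the smallest eigenvalues become the new coordinates, and ordinary k-means runs on them.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_moons
    from sklearn.neighbors import kneighbors_graph

    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    # Symmetric k-NN adjacency matrix W, then unnormalised Laplacian L = D - W.
    W = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W

    # Eigenvectors for the *smallest* eigenvalues (PCA would take the
    # largest); use the first two as the embedded coordinates.
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    embedding = eigvecs[:, :2]

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)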

Questions

  • Time complexity?
    • Not sure analytically, but a recommendation problem with 400,000 data points on users and movies took 30 minutes on a laptop.
    • You first spend O(n^2) calculating all pairwise distances, then sparsify using a threshold. Couldn't you sparsify first, and only then calculate distances?
      • Yes, probably (see the sketch below).
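
A minimal sketch of the sparsify-first idea, assuming a k-nearest-neighbours graph so that only O(n·k) similarities are ever stored, rather than a dense O(n^2) distance matrix that is thresholded afterwards:

    from sklearn.cluster import SpectralClustering
    from sklearn.datasets import make_moons
    from sklearn.neighbors import kneighbors_graph

    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    # Build a sparse k-NN graph directly (tree-based neighbour search),
    # instead of computing all pairwise distances and then thresholding.
    A = kneighbors_graph(X, n_neighbors=10, include_self=False)
    A = 0.5 * (A + A.T)  # symmetrise the adjacency matrix

    sc = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0)
    labels = sc.fit_predict(A)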

In [ ]: